Abstract

This paper addresses a newly emerging task in visual surveillance:
re-identifying people at a distance by matching body information,
given several reference examples. Most existing works solve this
task by matching a reference template with the target individual,
but often suffer from large human appearance variability (e.g.
different poses/views, illumination) and high false positives in
matching caused by conjunctions, occlusions or surrounding clutter.
To address these problems, we construct a simple yet expressive
template from a few reference images of a certain individual, which
represents the body as an articulated assembly of compositional and
alternative parts, and propose an effective matching algorithm with
cluster sampling. This algorithm is designed within a candidacy
graph whose vertices are matching candidates (i.e. a pair of source
and target body parts), and iterates in two steps until convergence:
(i) it generates possible partial matches based on compatible and
competitive relations among body parts; (ii) it confirms the partial
matches to generate a new matching solution, which is accepted by
the Markov Chain Monte Carlo (MCMC) mechanism. In the experiments,
we demonstrate the superior performance of our approach on three
public databases compared to existing methods.

1. Introduction 

Person re-identification at a distance is receiving increasing
attention in video surveillance, particularly for applications that
restrict the use of face recognition. This task is very challenging,
however, due to the following difficulties.

• Robust human representation (signature). The human body varies
greatly in appearance (e.g., different views, poses, lighting
conditions). It is usually intractable to construct a template of
the individual to be recognized by extracting only low-level image
features.

Figure 1. An illustration of the proposed approach. A query
individual is represented as a compositional part-based template,
and part proposals are extracted from multiple instances at each
part. Human re-identification is thus posed as compositional
template matching. Note that certain parts are omitted for clarity.

• Effective human matching (localizing). Given the template,
re-identifying targets with the global body information often
suffers from a high rate of false positive matches, as the targets
may be occluded by or conjoined with other people and the background
in realistic surveillance applications. Furthermore, it is desirable
to accurately localize human body parts in general.

The objective of human re-identification in this work is to
recognize an individual by employing body information to address the
above difficulties. We study the problem under the following
setting, based on the application requirements in surveillance:
(1) The clothing of individuals remains unchanged across different
scenarios. (2) The individual to be re-identified should be at a
moderate resolution (e.g., > 120 pixels in height). Our approach
builds a compositional part-based template to represent the target
individual and matches the template with input images by employing a
stochastic cluster sampling algorithm, as illustrated in Fig. 1.

We organize the template of a query individual with an expressive
tree representation that can be produced in a very simple way. We
run the human body part detectors [1, 2] on several reference images
of the individual, and the images of detected parts are grouped
according to their semantics. That is, a human template is
decomposed into body parts, e.g., head, torso, arms, each of which
is associated with a number of part instances. Note that we can
prune the instances sharing very similar appearances with others.
This expressive template fully exploits information from multiple
reference images to capture appearance variability well, partially
motivated by the recently proposed hierarchical and part-based
models in object recognition [23, 18, 1]. Specifically, several
possible instances (namely proposals), extracted from different
references, exist at each part in the template, and we refer to this
representation as the multiple-instance-based compositional template
(MICT). As a result, new appearance configurations can be composed
from the part proposals in the MICT. One may question the
scalability of building such a customized template. We argue that
the critical concern is accurately identifying the target in
realistic scenarios, e.g., searching for one suspect across scenes,
rather than processing large numbers of targets at the same time.

In the inference stage, the body part detectors are first used to
generate possible part locations in the scene shot, and human
re-identification is then posed as the task of part-based template
matching. Unlike traditional matching problems, the multiple part
proposals in the MICT make the search space of matching
combinatorially large, as the part proposals need to be activated
along with the matching process. Handling the false alarms and
misdetections of the part detectors is also a non-trivial issue
during matching. Inspired by recent studies in cluster sampling
[6, 17, 22], we propose a stochastic algorithm to solve the
compositional template matching.

The matching algorithm is designed based upon the candidacy graph,
where each vertex denotes a pair of matching part proposals, and
each edge represents the contextual interaction (i.e. the compatible
or the competitive relation) between two matching pairs. Compatible
relations encourage vertices to activate together, while competitive
relations suppress conflicting vertices from being activated at the
same time. Specifically, two vertices are encouraged to be activated
together if they are kinematically or symmetrically related, whereas
two vertices are constrained so that only one of them can be
activated if they belong to the same part type or overlap. The
algorithm iterates in two steps to search for the optimal matching
solution: (i) it forms several possible partial matches (clusters)
by turning off the edge links probabilistically and
deterministically; (ii) it activates clusters to confirm partial
matches, leading to a new matching solution that will be accepted by
the Markov Chain Monte Carlo (MCMC) mechanism [( ]. Note that body
parts are allowed to be unmatched to cope with occlusions.

The main contributions of this paper are two-fold. First, we propose
a novel formulation to solve human re-identification by matching the
composite template with cluster sampling. Second, we present a new
database including realistic and general challenges for human
re-identification, which is more complete than existing related
databases.

2. Related Work 

In the literature, previous works on human re-identification mainly
focus on constructing and selecting distinctive and stable human
representations, and they can be roughly divided into the following
two categories.

Global-based methods define a global appearance human signature with
rich image features and match given reference images with the
observations [14, 24, 8]. For example, D. Gray et al. propose the
feature ensemble to deal with viewpoint invariant recognition. Some
methods improve the performance by extracting features with region
segmentation [15, 25, * ]. Recently, advanced learning techniques
have been employed for more reliable matching metrics [26], more
representative features [19], and more expressive multi-valued
mapping functions [3]. Despite acknowledged success, this category
of methods often has difficulty handling large pose/view variance
and occlusions.

Compositional approaches re-identify people by using part-based
measures. They first localize salient body parts, and then search
for part-to-part correspondence between reference samples and
observations. These methods show promising results on very
challenging scenarios [21], benefiting from powerful part-based
object detectors. For example, N. Gheissari et al. [12] adopt a
decomposable triangulated graph to represent person configuration,
and the pictorial structures model for human re-identification is
introduced in [7]. In addition, modeling contextual correlation
between body parts is discussed in [f ].

Many works [12, 8, 7] utilize multiple reference instances for each
individual, i.e. multi-shot approaches, but they ignore occlusions
and conjunctions in the target images and re-identify the target by
computing a one-to-many distance, while we explicitly handle these
problems by exploiting reconfigurable compositions and contextual
interactions during inference.

The rest of this paper is organized as follows. We first introduce
the representations in Section 3, and then discuss the inference
algorithm in Section 4. The experimental results are shown in
Section 5, and the conclusion follows in Section 6.

3. Representation 

In this section, we first introduce the definition of the
multiple-instance-based compositional template, and then present the
problem formulation of human re-identification.

3.1. Compositional Template

In this work, we present a compositional template to model humans
with large variations.

A human body is decomposed into N = 6 parts: head, torso, upper
arms, forearms, thighs and calves, and each limb is further
decomposed into two symmetrical parts (i.e. left and right), as
shown in Fig. 2(a). Each part $g$ is modeled as a rectangle and
indicated by a 5-tuple $(t, x, y, \theta, s)$, where $t$ denotes the
part type, $x$ and $y$ the part center coordinates, $\theta$ the
part orientation, and $s$ the part relative scale, as widely
employed in the pictorial structures model [9, 1]. The
multiple-instance-based compositional template (MICT) $T$ is defined
as

$$T = \{T_i : T_i = \{g\}\}, \tag{1}$$

where $g$ denotes a part proposal and $T_i$ the set of proposals for
the $i$th part in the template.

Given reference images of an individual, the MICT is constructed as
follows.

We first employ body part detectors to scan every reference image
and obtain detection scores for all body parts. The training and
detection process of the part detectors closely follows [2]. Given
the detection scores, we further prune implausible part
configurations by several strategies: (i) For all parts, a firing
detection is pruned if its overlap rate with the foreground mask
(obtained by background subtraction) is less than 75%. (ii) The
reference image is segmented into 4 horizontal strips of equal
height, numbered first to fourth from top to bottom. The head is
detected in the first strip, the parts of the upper body (i.e.
torso, upper arms and forearms) in the second, and the parts of the
lower body (i.e. thighs and calves) in the rest. Finally, we apply
non-maximum suppression and collect the K proposals with the highest
responses for each part from all reference images.
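
The pruning and selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: boxes are axis-aligned rather than oriented rectangles, a detection is assigned to a strip by its center, the part names, the `Detection` record and the NMS threshold of 0.5 are hypothetical, and the function handles the detections of a single part type.

```python
from dataclasses import dataclass

# Hypothetical strip assignment following the text: strip 0 is the top quarter.
ALLOWED_STRIPS = {
    "head": {0},
    "torso": {1}, "upper_arm": {1}, "forearm": {1},
    "thigh": {2, 3}, "calf": {2, 3},
}


@dataclass(frozen=True)
class Detection:
    part: str          # part type, e.g. "head"
    box: tuple         # (x1, y1, x2, y2), axis-aligned (a simplification)
    score: float       # detector response
    fg_overlap: float  # fraction of the box covered by the foreground mask


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_proposals(dets, k, image_height, min_fg=0.75, nms_iou=0.5):
    """Prune detections of one part type, apply greedy NMS, keep the top k."""
    def plausible(d):
        center_y = 0.5 * (d.box[1] + d.box[3])
        strip = min(3, int(4 * center_y / image_height))
        return d.fg_overlap >= min_fg and strip in ALLOWED_STRIPS[d.part]

    kept = []
    # Visit detections from the highest to the lowest score.
    for d in sorted(filter(plausible, dets), key=lambda d: d.score, reverse=True):
        # Keep a detection only if it does not overlap a stronger kept one.
        if all(iou(d.box, o.box) <= nms_iou for o in kept):
            kept.append(d)
    return kept[:k]
```

For target images, the same routine could be reused with the strip test disabled, since the text prunes target detections by the foreground mask only.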

Given target images (scene shots) to be matched, we obtain the
target proposal set G by a process similar to constructing the MICT,
except that firing detections are pruned only by the foreground
mask. Considering the realistic complexities in surveillance, the
target proposal set G is likely to contain a large number of false
alarms.

3.2. Candidacy Graph 

Given the template T and the target proposal set G, the problem of
human re-identification can be posed as the task of part-based
template matching and solved in two steps: (i) activating one
proposal for each part in T, (ii) finding the match in G.



Figure 2. An illustration of compatible relations. (a) Kinematics
(navy blue edges) and symmetry (brown edges) relations within the
compositional template. (b) An example showing how target part
proposals are coupled together by kinematics and symmetry relations.


We define the set of activated part proposals $\Psi$ from T, each of
which corresponds to a certain part:

$$\Psi = \{\Psi_i : l(\Psi_i) = 1,\ \Psi_i \in T_i\}. \tag{2}$$

The binary label $l(\cdot)$ indicates whether the proposal is
activated or remains inactivated, i.e. $l(\cdot) = 1$ for activated
and $l(\cdot) = 0$ for inactivated. The set of matched part
proposals from G can be defined as

$$\Phi = \{\Phi_i : l(\Phi_i) = 1,\ \Phi_i \in G \cup \{\emptyset\}\}, \tag{3}$$

where $\Phi_i$ maps the activated proposal of the $i$th part in T to
a proposal in G. Note that $\Psi_i$ does not necessarily have a
match (i.e. $l(\emptyset) = 1$), in case the matched part is
occluded or missed in G.

To solve these two steps simultaneously, we propose a candidacy
graph representation and further formulate the problem as graph
labeling. We define the candidacy graph G = <C, E>, where each
vertex $c_i \in C$ denotes a candidate matching pair
$(\Psi_i, \Phi_i)$. A similar binary label $l(c_i)$ is employed to
indicate whether a matching pair $c_i$ is activated or not. Solving
the matching problem is equivalent to labeling the vertices C in the
candidacy graph G. The label set L is thus defined as

$$L = \{l(c_i) = k : k \in \{0, 1\},\ i = 1, \dots, |C|,\ c_i \in C\}. \tag{4}$$

Each edge $e_{ij} = \langle c_i, c_j \rangle$ in G denotes the
relation between two matching pairs $c_i$ and $c_j$. We incorporate
two kinds of relations, i.e. compatible and competitive relations,
to model the contextual interactions in scene shots. In the
following discussion, we drop the edge index $ij$ for notational
simplicity.

Compatible relations encourage matching pairs to activate together
in matching. We represent compatible relations by how two target
part proposals are coupled together and mainly explore two cases:
(i) kinematics relations for coupling kinematically dependent parts,
(ii) symmetry relations for coupling symmetrical parts. That is,


















Figure 3. A re-identification example used to illustrate our
inference algorithm. Given (a) reference images and (b) a scene
shot, proposals of four parts (head, torso, left thigh, left calf)
are drawn and numbered in the image. Note that we omit the other
parts and keep only a few proposals for clarity.

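$$p_e^+(c_i, c_j) =
\begin{cases}
p_k(\Phi_i, \Phi_j), & \langle t_i, t_j \rangle \in Kin, \\
p_s(\Phi_i, \Phi_j), & \langle t_i, t_j \rangle \in Sym,
\end{cases} \tag{5}$$

where $t_i$ and $t_j$ denote the part types of $\Phi_i$ and
$\Phi_j$, respectively.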

(i) Kinematics relations describe the spatial relationship between
kinematically dependent parts (navy blue edges in Fig. 2(a)). The
spatial distribution $p_k(\Phi_i, \Phi_j)$ between two proposals
$\Phi_i$ and $\Phi_j$ is modeled as a zero-mean Gaussian
distribution under the coordinate system of their connecting joint:

$$p_k(\Phi_i, \Phi_j) \propto \mathcal{N}\big(F_{t_i}(\Phi_i) - F_{t_j}(\Phi_j);\ 0,\ \Sigma_{ji}\big), \tag{6}$$

where $F_{t_i}(\cdot)$ and $F_{t_j}(\cdot)$ are the transformations
of $\Phi_i$ and $\Phi_j$ from the image coordinate system to the
joint coordinate system. For detailed explanations, see [9, 1].

Figure 4. An illustration of the candidacy graph representation. We
visualize the candidacy graph of Fig. 3. In the graph, vertices
denote candidate matches, and blue and red edges indicate compatible
and competitive edges between vertices, respectively.

In the experiment, kinematics relations are learnt from 
reference images with body part annotations. 

(ii) Symmetry relations measure the appearance similarity between
symmetrical parts (brown edges in Fig. 2(a)). We suppose that
symmetrical parts from the same individual tend to share similar
appearance while those from different individuals do not. Therefore
the symmetry relations are represented as

$$p_s(\Phi_i, \Phi_j) \propto \exp\{-D(\Phi_i, \Phi_j)\}, \tag{7}$$

where $D(\cdot)$ measures the distance between two part proposals
and is defined in Equ.(12).

We give an example to illustrate how kinematics relations and
symmetry relations work in scene shots, as shown in Fig. 2(b). Note
that we omit certain part proposals for clarity.

Competitive relations suppress conflicting matching pairs from being
activated at the same time. We also develop two cases for
competitive relations: (i) Two target proposals with the same part
type cannot be activated simultaneously. (ii) The overlapped region
between two target part proposals should only be compared once.
That is,


Pe ( Ci,Cj ) OC 


ti tn 




, x (8) 

where $\mathrm{IoU}(\Phi_i, \Phi_j)$ indicates the
intersection-over-union overlap between $\Phi_i$ and $\Phi_j$, and
$\lambda$ is a scaling constant.

An illustration of the candidacy graph representation is 
shown in Fig. 4, corresponding to the example in Fig. 3. 

In summary, the problem of matching the template T to the target
proposal set G can be represented as

$$M = (N_u, N_s, L), \tag{9}$$

where $N_u$ denotes the number of unmatched part pairs and $N_s$ the
number of scales of the activated proposals; both can be computed
from the label set L. According to Bayes' rule, M can be solved by
maximizing a posterior probability:

$$M^* = \arg\max_M p(M \mid T, G) = \arg\max_M p(M \mid G) \propto \arg\max_M p(G \mid M) \cdot p(M). \tag{10}$$

Likelihood $p(G \mid M)$ measures the appearance similarity between
the template and the matching target. Assuming that the appearance
similarity of each matching pair is independent, $p(G \mid M)$ can
be factorized into

$$p(G \mid M) \propto \prod_{l_i = 1,\ c_i \in C} \exp\{-D(\Psi_i, \Phi_i)\}, \tag{11}$$

where $D(\cdot)$ denotes the distance between two proposals.

Figure 5. An illustration of one transition in composite cluster
sampling. The first and second rows show, for the two successive
states (A and B) of one reversible transition respectively, the
labels of part proposals, the labels of the composite cluster and
the matching configuration.

We adopt the modified HSV color histogram [7] and the MSCR
descriptor [11] to describe the visual statistics of each part
proposal, which have been widely used in existing human
re-identification studies [8, ]. The distance $D(\cdot)$ between two
arbitrary proposals $g_i$ and $g_j$ is defined as

$$D(g_i, g_j) = D_{Bh}(H_i, H_j) + D_{MSCR}(Dc_i, Dc_j), \tag{12}$$

where $H$ denotes the normalized HSV color histogram, $Dc$ the MSCR
descriptor, and $D_{Bh}(\cdot)$ and $D_{MSCR}(\cdot)$ the
Bhattacharyya distance and the distance defined in [8],
respectively.
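
As a hedged illustration of the histogram term in Equ.(12), the sketch below computes a Bhattacharyya distance between two histograms. The text does not state which variant of the distance is used; the form $\sqrt{1 - BC}$ chosen here (with $BC$ the Bhattacharyya coefficient) is an assumption, and `part_distance` with its precomputed `mscr_term` argument is a hypothetical helper that merely adds the two terms.

```python
import math


def bhattacharyya_distance(h1, h2):
    """Distance sqrt(1 - BC) between two non-negative histograms.

    Both histograms are normalized to sum to 1 first. BC is the
    Bhattacharyya coefficient, sum_b sqrt(p_b * q_b).
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    s1, s2 = sum(h1), sum(h2)
    if s1 <= 0 or s2 <= 0:
        raise ValueError("histograms must have positive mass")
    bc = sum(math.sqrt((a / s1) * (b / s2)) for a, b in zip(h1, h2))
    # Clamp to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def part_distance(hist_i, hist_j, mscr_term=0.0):
    """D(g_i, g_j) as the sum of the histogram term and a given MSCR term.

    mscr_term stands in for D_MSCR, which is defined in [8] and is not
    reimplemented here.
    """
    return bhattacharyya_distance(hist_i, hist_j) + mscr_term
```

Identical histograms give a distance of 0 and histograms with disjoint support give 1, so the histogram term is bounded regardless of the number of bins.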

Prior $p(M)$ penalizes the undesired activation of matching pairs
(e.g. missing parts) and matching inconsistency among the activated
matching pairs. We define $p(M)$ as

$$p(M) = p(N_u) \cdot p(N_s) \cdot p(L) \propto \exp\{-\alpha_u N_u - \alpha_s N_s\} \cdot p(L), \tag{13}$$

where $\alpha_u$ and $\alpha_s$ are the corresponding parameters for
$N_u$ and $N_s$, respectively.

$p(L)$ imposes constraints on the edge links among activated
vertices, that is

$$p(L) \propto \prod_{l_i = l_j = 1,\ e \in E^+} p_e^+(c_i, c_j) \prod_{l_i = l_j = 1,\ e \in E^-} \big(1 - p_e^-(c_i, c_j)\big), \tag{14}$$

where $E^+$ and $E^-$ indicate the compatible edges and competitive
edges in the candidacy graph G, respectively.

4. Inference Algorithm 

In a scene shot containing multiple individuals, matching the
template to the target becomes an extremely complicated problem. For
example, in Fig. 3, the four individuals in the shot all share
similar appearance with the template. As a result, solving Equ.(10)
is likely to end in a locally optimal solution. In this case,
popular inference algorithms, such as EM, Belief Propagation and
Dynamic Programming, easily get stuck and thus fail to re-identify
the correct target (i.e. to find the globally optimal solution),
while Composite Cluster Sampling, as introduced in [17, 22],
overcomes this problem by jumping between partial coupling matches
in each MCMC step. Therefore, we employ Composite Cluster Sampling
to search for the optimal match between the template and the correct
target.

The Composite Cluster Sampling algorithm consists of the following
two steps:

(I) Generating a composite cluster. Given a candidacy graph
G = <C, E> and the current matching state M, we first separate the
graph edges E into two sets: the set of inconsistent edges
$\{e \in E^+ : l(c_i) \neq l(c_j)\} \cup \{e \in E^- : l(c_i) = l(c_j)\}$
(i.e. edges violating the current state) and the set of consistent
edges in the other two cases. Next we introduce a boolean variable
$\omega_e \in \{1, 0\}$ to indicate whether an edge is turned on or
turned off. We turn off inconsistent edges deterministically and
turn on every consistent edge with its edge probability $p_e$.
Afterwards, we regard candidates connected by "on" positive edges as
a cluster $Cl$ and collect clusters connected by "on" negative edges
to generate a composite cluster $V_{cc}$.
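
Step (I) can be sketched in a few lines. The following is an illustrative sketch, not the authors' code: the graph is given as hypothetical dictionaries mapping vertex pairs to edge probabilities, and the function returns every composite cluster found in one sweep, leaving the choice of which one to relabel to the caller (the text does not specify how that choice is made).

```python
import random


def _components(nodes, edges):
    """Connected components of an undirected graph via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def sample_composite_clusters(labels, pos_edges, neg_edges, rng=None):
    """One edge-sampling sweep of step (I).

    labels:    dict vertex -> 0/1 (current state)
    pos_edges: dict (u, v) -> probability of a compatible edge
    neg_edges: dict (u, v) -> probability of a competitive edge
    Returns a list of composite clusters, each a list of vertex sets.
    """
    rng = rng or random.Random()
    # Consistent positive edges join equal labels; inconsistent ones stay off.
    on_pos = [e for e, p in pos_edges.items()
              if labels[e[0]] == labels[e[1]] and rng.random() < p]
    # Consistent negative edges join different labels.
    on_neg = [e for e, p in neg_edges.items()
              if labels[e[0]] != labels[e[1]] and rng.random() < p]

    clusters = _components(list(labels), on_pos)
    cluster_of = {v: i for i, cl in enumerate(clusters) for v in cl}
    links = [(cluster_of[u], cluster_of[v]) for u, v in on_neg]
    groups = _components(range(len(clusters)), links)
    return [[clusters[i] for i in sorted(g)] for g in groups]
```

With all probabilities set to 1, every consistent edge is turned on and the clusters are simply the connected groups of equally labeled, compatible candidates.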

(II) Relabeling the composite cluster. In this step, we randomly
choose a cluster from the obtained composite cluster $V_{cc}$ and
flip the labels of the selected cluster and its conflicting clusters
(i.e. the clusters connected with the selected cluster), which
generates a new state $M'$. To find a better state and achieve a
reversible transition between the two states $M$ and $M'$, the
acceptance rate of the transition from state $M$ to state $M'$ is
defined by a Metropolis-Hastings method [20]:

$$\alpha(M \to M') = \min\left(1,\ \frac{q(M' \to M) \cdot p(M' \mid G)}{q(M \to M') \cdot p(M \mid G)}\right), \tag{15}$$

where $q(M' \to M)$ and $q(M \to M')$ denote the state transition
probabilities, and $p(M' \mid G)$ and $p(M \mid G)$ the posteriors
defined in Equ.(10).

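Following [6], the state transition probability ratio is computed by

$$\frac{q(M' \to M)}{q(M \to M')} \propto \frac{q(V_{cc} \mid M')}{q(V_{cc} \mid M)} \propto \frac{\prod_{e \in \mathcal{E}^+_{M'}} (1 - p_e^+) \prod_{e \in \mathcal{E}^-_{M'}} (1 - p_e^-)}{\prod_{e \in \mathcal{E}^+_{M}} (1 - p_e^+) \prod_{e \in \mathcal{E}^-_{M}} (1 - p_e^-)}, \tag{16}$$

where $\mathcal{E}^+$ and $\mathcal{E}^-$ denote the sets of
positive and negative edges being turned off around $V_{cc}$,
respectively, that is,

$$\begin{aligned}
\mathcal{E}^+ &= \{e \in E^+ : c_i \in V_{cc},\ c_j \notin V_{cc},\ l(c_i) = l(c_j)\}, \\
\mathcal{E}^- &= \{e \in E^- : c_i \in V_{cc},\ c_j \notin V_{cc},\ l(c_i) \neq l(c_j)\}.
\end{aligned} \tag{17}$$

Note that the subscript of $\mathcal{E}^+$ and $\mathcal{E}^-$ in
Equ.(16) indicates the current state and is omitted for simplicity
in the above definition.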
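
To make Equ.(15) and Equ.(16) concrete, the sketch below evaluates the acceptance rate in the log domain, which avoids underflow when many cut edges are involved. It is an illustration under assumptions: the proportionality in Equ.(16) is treated as an equality, every edge probability is assumed to lie in [0, 1), and the function names and argument layout are hypothetical.

```python
import math


def log_cut_weight(cut_pos_probs, cut_neg_probs):
    """log of prod(1 - p) over the positive and negative edges that are
    cut around the composite cluster in a given state."""
    return (sum(math.log1p(-p) for p in cut_pos_probs)
            + sum(math.log1p(-p) for p in cut_neg_probs))


def acceptance_rate(log_post_new, log_post_old, cut_new, cut_old):
    """Metropolis-Hastings acceptance rate for the move M -> M'.

    log_post_new, log_post_old: log p(M'|G) and log p(M|G), up to a
        shared additive constant.
    cut_new, cut_old: pairs (positive edge probs, negative edge probs)
        of the edges cut around V_cc in state M' and in state M.
    """
    log_q_ratio = log_cut_weight(*cut_new) - log_cut_weight(*cut_old)
    log_alpha = log_q_ratio + log_post_new - log_post_old
    return 1.0 if log_alpha >= 0 else math.exp(log_alpha)
```

A move is then accepted when a uniform random number in [0, 1) falls below the returned value, so moves that improve the posterior without worsening the proposal ratio are always taken.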

We show an example of one transition in composite cluster sampling
in Fig. 5. In this figure, $V_{cc}$ contains two clusters
$\{Cl_1, Cl_2\}$. In state A, $Cl_1$ is activated and the
conflicting cluster $Cl_2$ is deactivated, while in state B the
labels of $Cl_1$ and $Cl_2$ are flipped. The transition from state A
to state B achieves a fast jump between two kinds of partial
coupling matches and coincides with an individual-to-individual
comparison in re-identification.

Applying the above mechanism, we summarize the in¬ 
ference algorithm in Algorithm 1. 


Algorithm 1: Re-ID via Composite Cluster Sampling
Input: MICT T and target proposal set G
Output: Matching configuration M

1 Find candidate matching pairs C from T and G;
2 Compute compatible and competitive relations E among C;
3 Construct candidacy graph G = <C, E>;
4 Initialize M as $N_u = N_s = 0$, $\forall l_i \in L,\ l_i = 0$
  (i.e. all candidate matching pairs inactivated);
5 repeat
6     Generate a composite cluster $V_{cc}$;
7     Relabel the composite cluster $V_{cc}$ to generate a new state $M'$;
8     Accept the new state $M'$ with acceptance rate $\alpha(M \to M')$;
9 until M converges;


5. Experiments 

In this section, we first introduce the datasets and the 
parameter settings, and then show our experimental results 
as well as component analysis of the proposed approach. 


5.1. Datasets and Settings 

We validate our method on three public databases as follows.

(i) VIPeR dataset¹. It is commonly used for human re-identification
and contains 632 people captured outdoors, with 2 images for each
individual.

(ii) EPFL dataset². This database is very challenging and was
originally proposed for multi-view tracking [10]. It consists of 5
different scenarios that are filmed by three or four cameras from
different angles. To evaluate our method, we extract individuals
from the original videos and annotate each of them with ID and
location (bounding box). In total, there are 70 reference images for
30 different individuals (normalized to 175 pixels in height), and
80 shots of size 360 x 288, which contain 294 targets to be
re-identified.

(iii) CAMPUS-Human dataset³. We construct this database to include
general and realistic challenges for people re-identification in
surveillance. There are 370 reference images, normalized to 175
pixels in height, for 74 individuals, with IDs and locations
provided. We present 214 shots containing 1519 targets for
evaluating methods, and the targets often appear with diverse
poses/views, conjunctions and occlusions; see Fig. 7 (bottom row).
Note that all images in both the EPFL dataset and the CAMPUS-Human
dataset are captured from the original videos with a large time gap
to guarantee appearance variety (unlike the ETHZ dataset [24]).

Experiment settings. For the VIPeR dataset, we adopt the common
setting of running the algorithm on random partitions containing 316
pairs. For the EPFL and CAMPUS-Human datasets, we randomly select
reference images for each individual, and all target images are
tested for matching. The results on all three datasets are averaged
over ten runs. Our approach is evaluated with both a single
reference image (single-shot, SvsS) and multiple reference images
(multi-shot, MvsS, M = 2, 3).

All the parameters are fixed in the experiments, including
$\lambda = 10$ for scaling the overlap IoU, and $\alpha_u = 12$ and
$\alpha_s = 3$ for penalizing the activation of vertices. We
construct the MICT for each individual from their selected reference
images. During re-identification, a number K of body part proposals
are generated. In practice, we set K to approximately 3 times the
number of individuals in the shot.

We implement our approach in C++ and run the program on a PC with an
i5 2.8 GHz CPU and 4 GB of memory. On average, the inference
algorithm converges after around 500 sampling steps, which takes
between 2 s and 40 s. The time cost depends on the complexity of the
candidacy graph.


1 Available at www.umiacs.umd.edu/~schwartz/datasets.html

2 Available at cvlab.epfl.ch/data/pom/ 

3 Available at http://vision.sysu.edu.cn/projects/human-reid/ 
Figure 6. Performance comparisons using the CMC curves on the VIPeR
(first from left), EPFL (second from left) and CAMPUS-Human (third
and fourth from left) datasets. The EPFL and CAMPUS-Human datasets
are evaluated in both single-shot and multi-shot cases.


5.2. Experimental Results 

We compare our approach with the following state-of-the-art methods:
Pictorial Structures (PS) [1], View-based Pictorial Structures (VPS)
[2], Custom Pictorial Structures (CPS) [7], Symmetry-driven
Accumulation of Local Features (SDALF) [8] and Ensemble of Localized
Features (ELF) [14]. We adopt the provided code of PS and implement
VPS and CPS according to their descriptions. For fair comparison,
the same likelihood is employed for PS, VPS and CPS as for the
proposed method. The results are evaluated in two ways:
(i) re-identifying individuals in segmented images, i.e. with
targets already localized, and (ii) re-identifying individuals from
scene shots without provided segmentations.

For the first evaluation, we adopt the cumulative match
characteristic (CMC) curve for quantitative analysis, as in previous
works [13, 24]. The curve reflects the overall ranked matching
rates; precisely, a rank r matching rate indicates the percentage of
correct matches found in the top r ranks. As Fig. 6 shows, our
approach outperforms the competing approaches in both the
single-shot case and the multi-shot case, and it yields the best
rank 1 matching rate on the EPFL and CAMPUS-Human datasets. We
observe that the performance of re-identification can be improved
significantly by fully exploiting reconfigurable compositions and
contextual interactions in inference. Our performance improves only
slightly on the VIPeR dataset, as most erroneous matches are due to
severe illumination changes, as has been shown in [8].
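
The rank r matching rate defined above can be computed as in the following sketch. It assumes a hypothetical input layout, a matrix of distances between each query and each gallery entry plus identity labels for both, and is meant only to make the CMC definition concrete.

```python
def cmc_curve(distances, query_ids, gallery_ids, max_rank=None):
    """Cumulative match characteristic curve.

    distances[q][g] is the distance between query q and gallery entry g
    (smaller means more similar). Entry r - 1 of the returned list is
    the fraction of queries whose correct identity appears within the
    top r ranked gallery entries.
    """
    n_gallery = len(gallery_ids)
    max_rank = max_rank or n_gallery
    hits = [0] * max_rank
    for q, row in enumerate(distances):
        order = sorted(range(n_gallery), key=lambda g: row[g])
        # Zero-based rank of the first gallery entry with the right identity.
        rank = next((r for r, g in enumerate(order)
                     if gallery_ids[g] == query_ids[q]), None)
        if rank is not None and rank < max_rank:
            hits[rank] += 1
    curve, running = [], 0
    for h in hits:
        running += h
        curve.append(running / len(query_ids))
    return curve
```

Because the curve is cumulative, it is non-decreasing in r and reaches 1 once every query's identity has been found.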

Table 1. Matching rate of re-identifying targets in scene shots
without provided segmentations.

Method        EPFL      CAMPUS-Human
Our M=2       57/294    215/1519
VPS M=2       54/294    175/1519
PS M=2        32/294    141/1519
Our Single    50/294    173/1519
VPS Single    49/294    139/1519
PS Single     24/294    118/1519

The second test is stricter, since the algorithms must also localize
the target during re-identification. We adopt the PASCAL Challenge
criterion to evaluate the localization results: a match is counted
as correct only if the intersection-over-union ratio (IoU) with the
ground-truth bounding box is greater than 50%. We compare our method
with PS [1] and VPS [2], which can localize the body at the same
time as localizing the parts. The quantitative results are reported
in Table 1. A number of representative results generated by our
method are exhibited in Fig. 7. The results show that existing
methods perform poorly when individuals are not well segmented and
scaled to a uniform size. In contrast, our method can re-identify
challenging target individuals by searching and matching their
salient parts and thus achieves better performance. Note that the
performance of our approach also drops significantly due to
inaccurate part localizations and interference from other
individuals.
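
The localization criterion can be written down directly. The sketch below is illustrative only: boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples, and `matching_rate` with its dictionary inputs is a hypothetical helper for counting correct matches in the style of Table 1.

```python
def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_correct_match(predicted, ground_truth, threshold=0.5):
    """PASCAL-style test: the IoU must be strictly greater than the threshold."""
    return box_iou(predicted, ground_truth) > threshold


def matching_rate(predictions, ground_truths):
    """Count correct localizations.

    predictions:   dict target key -> predicted box, or None if nothing
                   was returned for that target
    ground_truths: dict target key -> ground-truth box
    Returns (number correct, number of targets).
    """
    correct = sum(
        1 for key, gt in ground_truths.items()
        if predictions.get(key) is not None
        and is_correct_match(predictions[key], gt)
    )
    return correct, len(ground_truths)
```

For example, a prediction shifted by half the box width has an IoU of 1/3 and is rejected, while a shift of one fifth gives 2/3 and is accepted.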

Component Analysis. We further analyze the benefit of each component
of our approach on the CAMPUS-Human dataset under the multi-shot
setting with M = 3. Regarding feature effectiveness, we separately
evaluate different image features, as shown in Fig. 8 (left). It is
apparent that the combined feature improves the result. We also
demonstrate the effectiveness of the constraints employed: Fig. 8
(right) confirms that both kinematics and symmetry constraints help
construct a better matching solution.


Figure 8. Empirical studies on different features (left, "Features
comparison") and constraints (right, "Constraints comparison") used
in our approach on the CAMPUS-Human dataset.


6. Conclusion 

This paper studies a novel compositional template for human
re-identification, in the form of an expressive
multiple-instance-based compositional representation of the query
individual. By exploiting reconfigurable compositions and contextual
interactions during inference, our method handles the challenges in
human re-identification well.

























Figure 7. Results generated by our approach on the EPFL and
CAMPUS-Human datasets. In each result, the query individual is
specified by the image beside the shot. Green bounding boxes denote
the ground-truth target locations, while red bounding boxes are
generated by the algorithm.


Moreover, we will explore more robust and flexible part
representations and better inter-part relations in future work.